
Conversation

@JC-ut0
Contributor

@JC-ut0 JC-ut0 commented Nov 22, 2025

What this PR does / why we need it?

Add a Qwen3-235B tutorial, including the following examples (an illustrative command sketch follows the list):

  • Single-node Online Deployment for 128k context inference
  • Multi-node Deployment with MP
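For orientation, here is a minimal sketch of what the single-node serve command could look like, assembled from the flags quoted in the review threads below. The model path and the --tensor-parallel-size value are placeholders, not values taken from the tutorial.

# Minimal sketch, not the exact tutorial command: the model path and
# --tensor-parallel-size are assumed placeholders; the remaining flags are
# the ones quoted in the review comments on this PR.
vllm serve /path/to/Qwen3-235B \
  --tensor-parallel-size 16 \
  --quantization ascend \
  --served-model-name qwen3 \
  --max-num-seqs 4 \
  --max-model-len 133000 \
  --gpu-memory-utilization 0.95 \
  --rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}' \
  --additional-config '{"ascend_scheduler_config":{"enabled":false}}' \
  --compilation-config '{"cudagraph_capture_sizes":[1,4],"cudagraph_mode":"FULL_DECODE_ONLY"}'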

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new tutorial for running the Qwen3-235B model. The documentation is well-structured and provides good detail. I've found a couple of critical typos in model names within commands that would cause them to fail, and a potentially confusing or incorrect configuration for cudagraph_capture_sizes. I've left specific comments with suggestions to fix these issues.

@github-actions

github-actions bot commented Dec 2, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

--gpu-memory-utilization 0.95 \
--rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}' \
--additional-config '{"ascend_scheduler_config":{"enabled":false}}' \
--compilation-config '{"cudagraph_capture_sizes":[1,4],"cudagraph_mode":"FULL_DECODE_ONLY"}' \
Collaborator

The example we provided represents the best practice under normal circumstances: optimal performance under stable operating conditions. Is this capture size value a bit too small?

Contributor Author

@JC-ut0 JC-ut0 commented Dec 4, 2025

The cudagraph_capture_sizes value is chosen to match --max-num-seqs 4. This is an optimal example for 128k-sequence inference.
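In other words, the capture sizes only need to cover decode batch sizes that can actually occur: with --max-num-seqs 4 the running batch never exceeds 4 sequences, so capturing graphs for sizes 1 and 4 should be enough (vLLM pads a batch up to the nearest captured size). A hedged illustration of the pairing, using only the values already quoted from the tutorial command:

--max-num-seqs 4 \
--compilation-config '{"cudagraph_capture_sizes":[1,4],"cudagraph_mode":"FULL_DECODE_ONLY"}' \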

--quantization ascend \
--served-model-name qwen3 \
--max-num-seqs 4 \
--max-model-len 133000 \
Collaborator

The value of max-model-len should be 131072. I tried to run this command but got the following error:

(APIServer pid=598)   File "/vllm-workspace/vllm/vllm/engine/arg_utils.py", line 994, in create_model_config
(APIServer pid=598)     return ModelConfig(
(APIServer pid=598)            ^^^^^^^^^^^^
(APIServer pid=598)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=598)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=598) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=598)   Value error, User-specified max_model_len (133000) is greater than the derived max_model_len (max_position_embeddings=131072 or model_max_length=None in model's config.json). To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1. VLLM_ALLOW_LONG_MAX_MODEL_LEN must be used with extreme caution. If the model uses relative position encoding (RoPE), positions exceeding derived_max_model_len lead to nan. If the model uses absolute position encoding, positions exceeding derived_max_model_len will cause a CUDA array out-of-bounds error. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]

Contributor Author

Did you add this parameter: --rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}'?
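For reference, the tutorial command pairs the extended context length with the YaRN rope-scaling override, which is the combination the author is pointing to; whether --max-model-len 133000 still exceeds the derived limit on a given vLLM version is worth verifying, and the error above also names VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 as an explicit (use-with-caution) escape hatch. A minimal fragment showing the pairing, taken verbatim from the flags quoted earlier in this conversation:

--max-model-len 133000 \
--rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}' \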

@JC-ut0 JC-ut0 changed the title from "Add Qwen3-235B tutorial" to "[Doc] Add Qwen3-235B tutorial" on Dec 4, 2025